Data science, intelligence and future analysis
Monireh Hosseini; Elnaz Galavi
Abstract
Community detection is an important topic in social network analysis and is essential to understanding the structure of complex networks. Its goal is to identify groups whose nodes are densely connected to each other. In this research, deep learning techniques are used to handle high-dimensional graph data, and a comprehensive, integrated architecture for deep-learning-based community detection is presented. Classic community detection approaches are suited to low-dimensional networks, so reducing the dimensionality of complex networks is a significant topic in community detection. In this paper, to reveal both the direct and indirect connections among nodes, a new similarity matrix of the network topology is first built. A stacked autoencoder is then designed to reduce dimensionality through unsupervised learning. To detect communities, several clustering algorithms are then tested and applied. The proposed model is evaluated through experiments using standard criteria on six real data sets: Karate, Dolphins, Football, Polbooks, Cora, and Citeseer. The results show higher accuracy in identifying communities on the Football data set than the twelve algorithms proposed in past research, and a significant improvement on the other data sets compared to the thirteen algorithms.
Introduction
Today, with the increasing use of the Internet, social networks play an important role in people's daily lives. In a social network, some groups of nodes are more densely connected to each other than to the rest of the network; these groups are called communities (Sperli, 2019). Community detection is an important topic in social network analysis and is essential to understanding the structure of complex networks. Its goal is to identify the groups whose nodes are densely connected.
Many methods exist for community detection, and deep learning in particular has shown excellent performance in a wide range of research fields, including social networks and graph embedding.
In this research, deep learning techniques are used to handle high-dimensional graph data, and a comprehensive, integrated architecture for deep-learning-based community detection is presented.
Research Questions
Is it possible to create a new similarity matrix from the graph of complex networks that fully reveals the similarity relationships between network nodes?
What is the appropriate method of deep learning to represent the features of complex networks in low dimensions?
Is it possible to provide a deep-learning-based framework for community detection that is flexible enough to handle networks of different sizes?
Can more accurate clustering results be achieved for community detection?
Literature Review
2.1. Classic community detection approaches are suited to low-dimensional networks, so reducing the dimensionality of complex networks is a significant topic in community detection. The main disadvantage of a high-dimensional network is the heavy computational cost it imposes on community detection methods. A method is therefore needed to transform high-dimensional graphs into a lower-dimensional space while preserving the important information about network structure and node properties. According to past research, autoencoders are the dominant method for mapping data points into lower-dimensional spaces (Souravlas et al., 2021).
2.2. When representing a network, the proximity matrix can serve as the network similarity matrix and describe the similarity relationships between nodes. However, relationships between nodes in a social network are complex: in addition to the similarity between directly connected nodes, there are varying degrees of similarity between nodes that are not directly connected (Su et al., 2020).
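The point about indirect similarity can be illustrated with a minimal sketch. It assumes the proximity matrix is the binary adjacency matrix and uses a simple common-neighbor count as a stand-in for indirect similarity; the toy graph and the function name `common_neighbors` are hypothetical and do not come from the cited works.

```python
# Toy undirected graph (hypothetical): a path 0 - 1 - 2 and a separate edge 3 - 4.
edges = [(0, 1), (1, 2), (3, 4)]
n = 5

# Adjacency matrix: an entry is 1 only where two nodes share an edge.
adjacency = [[0] * n for _ in range(n)]
for u, v in edges:
    adjacency[u][v] = 1
    adjacency[v][u] = 1


def common_neighbors(a, i, j):
    """Count the nodes adjacent to both i and j (a two-hop, indirect link)."""
    return sum(a[i][k] * a[j][k] for k in range(len(a)))


direct_0_2 = adjacency[0][2]                      # nodes 0 and 2 share no edge
indirect_0_2 = common_neighbors(adjacency, 0, 2)  # but both are adjacent to node 1
indirect_0_3 = common_neighbors(adjacency, 0, 3)  # nodes 0 and 3 share nothing
print(direct_0_2, indirect_0_2, indirect_0_3)
```

In this toy case the adjacency entry for nodes 0 and 2 is zero even though they share a neighbor, which is the kind of information a purely direct-connection matrix leaves out.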
2.3. Wu et al. (2020) and Geng et al. (2020) reconstructed the adjacency matrix to represent the network. Dhilber and Bhavani (2020) used a cubic matrix as the input to stacked autoencoders, as did Yang et al. (2016). Xie et al. (2018) first proposed a new representation of network similarity and then fed it to a sparse filtering model to extract meaningful features of network nodes. However, in addition to the lack of neighbor information in the proximity matrix noted by Su et al. (2020), using only one function to measure the similarity between nodes cannot fully reveal the topological information of the network. A similarity matrix that addresses these gaps is therefore needed.
Methodology
In this paper, to reveal both the direct and indirect connections among nodes, a new similarity matrix of the network topology is first built. It is constructed from two matrices: the proximity matrix and the Sørensen–Dice similarity matrix from Xie et al. (2018). Next, to extract low-dimensional graph features, the new similarity matrix is given as input to a stacked autoencoder, which has several hidden layers and is trained in an unsupervised manner. Finally, communities are detected by applying the K-means, DBSCAN, and SNNDPC clustering algorithms to the learned low-dimensional features.
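The first step, building the similarity matrix, might look roughly like the sketch below. Several parts are assumptions rather than the authors' definitions: the proximity matrix is taken to be the binary adjacency matrix, the Sørensen–Dice similarity is computed over open neighborhoods as 2|N(i) ∩ N(j)| / (|N(i)| + |N(j)|), and the two matrices are combined by a plain element-wise sum, since the combination rule is not stated here. The toy graph and all function names are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Binary adjacency matrix of an undirected graph with n nodes."""
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    return a


def dice_matrix(a):
    """Sørensen–Dice similarity between the neighbor sets of every node pair."""
    n = len(a)
    neighbors = [{k for k in range(n) if a[i][k]} for i in range(n)]
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = len(neighbors[i]) + len(neighbors[j])
            if total:  # leave 0.0 when both nodes are isolated
                s[i][j] = 2 * len(neighbors[i] & neighbors[j]) / total
    return s


def combine(a, s):
    """ASSUMPTION: element-wise sum; the combination rule is not given in the text."""
    return [[x + y for x, y in zip(row_a, row_s)] for row_a, row_s in zip(a, s)]


# Hypothetical graph: two triangles (0, 1, 2) and (3, 4, 5) joined by the edge 2 - 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
a = adjacency_matrix(6, edges)
s = dice_matrix(a)
x = combine(a, s)

# Nodes 0 and 3 are not adjacent, yet they share neighbor 2.
print(a[0][3], s[0][3], x[0][3])
```

Under these assumptions, each row of `x` would serve as the input feature vector of one node for the stacked autoencoder, and the clustering algorithms would then run on the encoder's low-dimensional output.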
Conclusion
The proposed model is evaluated through experiments using standard criteria on six real data sets: Karate, Dolphins, Football, Polbooks, Cora, and Citeseer. The results show higher accuracy in detecting communities on the Football data set than the twelve algorithms proposed in past research, and a significant improvement on the other data sets compared to the thirteen algorithms. The experiments also demonstrate the superiority of the similarity matrix used in this research as a key prerequisite for community detection.
Keywords: Community Detection, Deep Learning, Autoencoder, Complex Networks.